Managing Terabyte-Scale Investigations with Similarity Digests
Abstract
The relentless increase in storage capacity and decrease in storage cost present an escalating challenge for digital forensic investigations: current forensic technologies are not designed to scale to the degree necessary to process ever-increasing volumes of digital evidence. This paper describes a similarity-digest-based approach that scales up the task of finding related digital artifacts in massive data sets. The results show that, on commodity multi-core computing systems, digests can be generated at rates exceeding those of cryptographic hashes. In addition, querying the digest of a large (1 TB) target for the (trace) presence of a small file can be completed in less than one second with very high precision and recall rates.
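The abstract does not spell out how a digest is built or queried. As a rough illustration only, the sketch below assumes a Bloom-filter-based design: content-selected 64-byte windows are hashed with SHA-1 and packed into fixed-size Bloom filters, and a small file is queried against a target's digest by testing each of its features for membership. The selection rule (keep a window whose hash is divisible by 64, standing in for a more careful statistical feature selection), the filter parameters, and the names `select_features`, `digest`, and `containment` are hypothetical choices for this example; the paper's actual feature selection, digest layout, and scoring may differ.

```python
import hashlib
import random

# Hypothetical parameters chosen for illustration; not the paper's settings.
FEATURE_LEN = 64     # bytes per candidate feature window
ANCHOR_MOD = 64      # keep a window when its hash % ANCHOR_MOD == 0 (content-defined)
BF_BITS = 2048       # bits per Bloom filter (256 bytes)
NUM_HASHES = 5       # bit positions per feature, each taken from 11 bits of the SHA-1
MAX_FEATURES = 160   # features per filter before a new filter is started


def select_features(data: bytes) -> list[bytes]:
    """Return SHA-1 digests of content-selected windows (a simple stand-in selector)."""
    feats, seen = [], set()
    for i in range(len(data) - FEATURE_LEN + 1):
        h = hashlib.sha1(data[i:i + FEATURE_LEN]).digest()
        # Selection depends only on window content, so it is independent of offset.
        if int.from_bytes(h[:4], "big") % ANCHOR_MOD == 0 and h not in seen:
            seen.add(h)
            feats.append(h)
    return feats


def bit_positions(h: bytes) -> list[int]:
    """Split the feature hash into NUM_HASHES 11-bit Bloom filter positions."""
    v = int.from_bytes(h, "big")
    return [(v >> (11 * j)) & (BF_BITS - 1) for j in range(NUM_HASHES)]


def feature_mask(h: bytes) -> int:
    mask = 0
    for p in bit_positions(h):
        mask |= 1 << p
    return mask


def digest(data: bytes) -> list[int]:
    """Pack the selected features into a list of Bloom filters (int bitmasks)."""
    filters, current, count = [], 0, 0
    for h in select_features(data):
        current |= feature_mask(h)
        count += 1
        if count == MAX_FEATURES:
            filters.append(current)
            current, count = 0, 0
    if count:
        filters.append(current)
    return filters


def containment(query: bytes, target_digest: list[int]) -> float:
    """Fraction of the query's features that appear present in any target filter."""
    feats = select_features(query)
    if not feats:
        return 0.0
    hits = 0
    for h in feats:
        mask = feature_mask(h)
        if any(f & mask == mask for f in target_digest):
            hits += 1
    return hits / len(feats)


if __name__ == "__main__":
    rng = random.Random(1)
    small = rng.randbytes(8192)
    noise = rng.randbytes(200_000)
    target = noise[:100_000] + small + noise[100_000:]
    d = digest(target)
    print(containment(small, d))                # 1.0: every embedded feature is found
    print(containment(rng.randbytes(8192), d))  # low: only Bloom-filter false positives
```

Because Bloom filters have no false negatives, a file embedded verbatim in the target always scores 1.0 here, while unrelated data matches only through false positives. This sketch scans every filter linearly; reaching sub-second queries over a 1 TB target would require a far more optimized implementation than shown.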
Similar resources
Content triage with similarity digests: The M57 case study
In this work we illustrate the use of similarity digests for the purposes of forensic triage. We use a case that consists of 1.5 TB of raw data, including disk images, network captures, RAM snapshots, and USB flash media. We demonstrate that by applying similarity digests in a systematic manner, the scope of examination can be narrowed down within a matter of minutes to hours. In contrast, conv...
Data Fingerprinting with Similarity Digests
State-of-the-art techniques for data fingerprinting have been based on randomized feature selection pioneered by Rabin in 1981. This paper proposes a new, statistical approach for selecting fingerprinting features. The approach relies on entropy estimates and a sizeable empirical study to pick out the features that are most likely to be unique to a data object and, therefore, least likely to tr...
Scalable Data Correlation
The fast capacity growth of cheap storage presents an ever-escalating problem for forensic investigations as currently employed forensic technologies are not designed to scale to the degree necessary to meet the challenge. In this work, we present an approach which seeks to scale up the process of finding related digital artifacts across large data sets by employing an advanced version of our s...
Managing Natural Language Requirements in Large-Scale Software Development
An increasing number of market- and technology-driven software development companies face the challenge of managing several thousands of requirements written in natural language. The large number of requirements causes bottlenecks in the requirements management process and calls for increased efficiency in requirements engineering. This thesis presents results from empirical investigations of usi...
Lessons Learned from Managing a Petabyte
The amount of data collected and stored by the average business doubles each year. Many commercial databases are already approaching hundreds of terabytes, and at this rate, will soon be managing petabytes. More data enables new functionality and capability, but the larger scale reveals new problems and issues hidden in “smaller” terascale environments. This paper presents some of these new pro...
Publication date: 2012